Dry and wet interfaces: Influence of solvent particles on molecular recognition

We present a coarse-grained lattice model to study the influence of water on the recognition
process of two rigid proteins. The basic model is formulated in terms of the hydrophobic effect.
We then investigate several modifications of our basic model showing that the selectivity of the
recognition process can be enhanced by considering the explicit influence of single solvent particles.
When the number of cavities at the interface of a protein-protein complex is fixed as an intrinsic
geometric constraint, there typically exists a characteristic fraction that should be filled with water
molecules such that the selectivity exhibits a maximum. In addition, the optimum fraction depends
on the hydrophobicity of the interface so that one has to distinguish between dry and wet interfaces.

PACS numbers: 87.15.A, 87.15.-v, 89.20.-a



I. INTRODUCTION 

Molecular recognition denotes the ability of a certain
biomolecule to find the right partner molecule in a
heterogeneous environment, such that the formed complex
can perform its assigned biological task. Prominent
examples of specific recognition processes between
proteins comprise enzyme-substrate binding, antigen- 
antibody binding or protein-receptor interactions [l|, I4I . 
It is a remarkable property of recognition processes that a
biomolecule (called probe molecule throughout this article)
can identify its "correct" complex partner by distinguishing
between the supposed "target" and a competing "rival"
molecule that possibly features only a slightly
different structure at the binding epitope. Therefore, an
understanding of molecular recognition processes is obvi- 
ously not only interesting from a biological point of view, 
but also necessary for various biotechnological or pharmaceutical
applications. The high specificity of molecular
recognition processes can be illustrated by the "lock-and-key"
mechanism for inflexible biomolecules which demands
a high geometrical complementarity for the two
molecules forming a complex 0, Q]- For that reason, 
there is in general only one possible binding partner (say
"key") for a given molecule ("lock"). As most macromolecules
prove to be flexible, the so-called "induced-fit"
scheme has been established, according to which the necessary
complementarity is only achieved after some conformational
changes of the corresponding backbones of
the proteins [1].

The forces that stabilize a protein complex basically emerge from a complicated interplay between
non-covalent bonds. These bonds are characterized by energies of the order of 2 to 6 kcal/mol Q. Since this
is only slightly stronger than the thermal energy $k_B T_{\rm room} \approx 0.62$ kcal/mol at physiological
conditions, we can conclude that the formation of a stable protein complex demands a large number of
non-covalent bonds and thus many participating functional groups with appropriate complementarity. It has
been found that the driving forces for molecular recognition are dominated by hydrogen bonds and especially
by the hydrophobic effect d 0, i, i, [T3l. The hydrophobic effect sums up the mechanism that the hydrophobic
residues of proteins are effectively pushed together when the polar solvent leaves the space between the
hydrophobic amino acids for entropic and energetic reasons.

The enormous significance of water for biological sys- 
tems has been manifest for many years [12]. Although 
water is essential for the structure, stability, dynamics 
and functions of biomolecules, biological models often 
describe the solvent only as a passive component of the 
system as is done, for example, by referring to the hydrophobic
effect. However, it has been shown that water
molecules which are imbedded in cavities between two
bound proteins play a crucial role for the formation and
stabilization of the complex and can thus be considered as
an active part of the structure d, [H, [H, Qj., 16, 17|].
Indeed it has been observed that in interfaces between 
two proteins about 10-20 % of the area is made up of 
cavities on average of which a large number are filled by 
at least one water molecule [1, O, [l9|. The energetic 
contributions of the buried water molecules are basically 
twofold. They can either contribute van der Waals in- 
teractions with adjacent amino acids or form hydrogen 
bonds between constituents of the two proteins (some- 
times involving more than one buried water molecule). 
The latter possibility requires a high degree of geometric 
directionality of the involved molecules and parts of the 
proteins. The energetic contributions due to these mediated
interactions are typically smaller by a factor of two
or three than direct contacts; however, examples where
they are of the same strength as direct contacts do exist.

Interfaces of protein complexes show different levels of 
hydration and can exhibit up to as many interactions 
caused by imbedded water molecules as by direct hy- 
drogen or salt bridges [3. On experimental grounds 
one can basically distinguish between "wet" interfaces 
with many imbedded water molecules and "dry" inter- 
faces where water is absent 0, [l^, [13]. Dry interfaces 
typically feature a ring of water molecules around the 
binding epitope. In general the less hydrophobic inter- 
faces between antibodies and antigens tend to be wet 
whereas the more hydrophobic protease-inhibitor interfaces
appear to be dry. This suggests a correlation between
the hydrophobicity of the interface and the degree
of hydration. Note however that exceptions to this broad 
rule do exist. 

In this article we will investigate the influence of buried 
water molecules in protein-protein interfaces on the se- 
lectivity of the corresponding recognition process. Our 
considerations are carried out within a coarse-grained ap- 
proach where the bulk solvent degrees of freedom are in- 
tegrated out. The energetics is then formulated on the 
level of amino acids in terms of the hydrophobic effect 
between residues of different hydrophobicity. Additional 
residual water degrees of freedom which are imbedded 
in the interface and can thus actively mediate interac- 
tions between amino acids are then incorporated into the 
model. From the point of view of modeling this can be
done by applying direct and water-mediated contact energies [15]
or by using generic double-well potentials of mean force
with one minimum corresponding to direct
contacts of two residues and a characteristic second one
resulting from water-separated contacts [l^, [3. We finally
remark that the problem of molecular recognition
has been considered in coarse-grained approaches in several
articles [11 il [H, H [H, [H, [13, [21, [H HO].

For our investigations we utilize a general two-stage
approach (Sec. II) where in a first step an ensemble of
probe molecules is designed with respect to a given target.
In a second step, we investigate the recognition ability
or selectivity of the probe ensemble by comparing the
associated free energy for the two cases that the probe
molecules bind the target or a different rival molecule,
respectively. In the subsequent sections, we will modify
the "elementary" hydrophobic-polar (HP) model by taking
the direct influence of single solvent molecules into
account. Nevertheless, we have to keep in mind that the
protein interaction with water is already part of the HP
model since its energetics are based on the hydrophobic
effect. In the following sections we analyze the influence
of buried water molecules in the interface on the selectivity
of molecular recognition. Whereas in Sec. III every
cavity at the interface is filled by a water molecule, in
Sec. IV we make the inclusion optional and additionally
couple the water's interaction to the adjacent type
of amino acid. In particular, we will investigate whether
or not the inclusion of solvent molecules in the interface
can lead to an enhancement of the selectivity. The technical
details of how the selectivity for the model with an
optional inclusion of water is calculated are discussed in
the appendix.



II. GENERAL APPROACH TO MOLECULAR
RECOGNITION

In this section we briefly discuss how we model the recognition process and introduce a measure of its
selectivity (more detailed accounts can be found elsewhere [H, m, [131). We model a protein's recognition site
at the interface of a protein-protein complex as a two-dimensional array of $N$ amino acids, also called residues
or monomers. Typical values of $N$ range between 30 and 60 [3]. For the description of a so-called probe molecule
$\theta$ that is supposed to recognize a certain target molecule, we introduce the $N$-dimensional vector
$\theta = (\theta_1, \ldots, \theta_N)$, whose $i$-th component indicates the type of amino acid on site $i$.
Accordingly, the target molecule $\sigma$ is specified by its residues $\sigma = (\sigma_1, \ldots, \sigma_N)$.
For the sake of simplicity we assume that both proteins have the same number of monomers at the interface which
match when forming a complex. We note that systems where this assumption holds true do exist [Slj.

To specify a single residue one should a priori distinguish between the 20 different amino acids occurring in
nature. In the coarse-grained approach of the hydrophobic-polar (HP) model, we reduce the alphabet of amino acids
to only two letters and differentiate between the polar and non-polar (hydrophobic) subgroup. Thus we get an
Ising-like variable and choose the convention to attribute to $\sigma_i$ or correspondingly $\theta_i$ the value
$+1$ for a hydrophobic (H) and $-1$ for a polar (P) monomer at site $i$. We justify this procedure by having in
mind that hydrophobicity acts as the dominant driving force in molecular recognition [2, [^, [lo|. Furthermore one
gets the two amino acid subgroups as a very good approximation by applying an eigenvalue decomposition of the
Miyazawa-Jernigan matrix which consists of the pairwise interactions between all natural amino acids [32, 33].
Note that there also exist other methods to reduce the alphabet of amino acids to five clustered subgroups [3^.l35t.

The induced-fit theory motivates us to account for minor rearrangements of amino acid side chains which provide
the needed complementarity for the formation of a protein-protein complex. This feature is incorporated into the
model by defining the quality of contact between the binding partners, labeled as $S = (S_1, \ldots, S_N)$. We
just discriminate between "good" ($S_i = +1$) and "bad" ($S_i = -1$) contacts at site $i = 1, \ldots, N$. The
(geometric) quality of the contact can be understood as a characteristic trait of one of the molecules or,
alternatively, as a collective variable of the probe and target molecule. The contact variable sums up all
geometric conditions at the interface, for example, the distances between opposite residues or the alignment of
their polar moments. Its relevance for the inclusion of water molecules at the interface is discussed in
Secs. III and IV.

In our picture of the protein complex, we consider a general Hamiltonian $\mathcal{H}(\sigma, \theta; S)$
depending on the structures $\sigma$ and $\theta$ and some kind of interaction between binding partners at
position $i$ which is related to the corresponding variable $S_i$. We formulate the energetics at the interface
by a modified HP model [26]:



$$\mathcal{H}(\sigma, \theta; S) := -\varepsilon \sum_{i=1}^{N} \frac{1+S_i}{2}\, \sigma_i \theta_i . \tag{1}$$



The parameter $\varepsilon > 0$ gives the strength of the hydrophobic interaction and is typically of the order
of 2 kcal/mol [sH. Note that the factor $\frac{1+S_i}{2} \in \{0, 1\}$ suppresses the contribution of binding
energy in the case of bad contacts. For a good contact at site $i$ we obtain the contribution
$-\varepsilon \sigma_i \theta_i$: If the type of residues of the protein interface in contact is identical,
i.e. $\sigma_i \theta_i = 1$, we will get a favorable term $-\varepsilon < 0$, whereas for different types of
amino acids the resulting $+\varepsilon$ represents a non-favorable energy contribution. We note that HP-like
models have been applied in various biophysical contexts over the last years [sl, [13, il 111 H EE E, El, El.
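
As a minimal sketch of Eq. (1), the energy of one interface configuration can be evaluated directly. The function name `hp_energy`, the tuple encoding of the structures (+1 for H, -1 for P) and the default value of the interaction strength are illustrative assumptions, not part of the model definition.

```python
def hp_energy(sigma, theta, S, eps=2.0):
    """Contact energy of Eq. (1): -eps * sum_i (1 + S_i)/2 * sigma_i * theta_i.

    sigma, theta: residue types of target and probe (+1 hydrophobic, -1 polar).
    S: contact quality per site (+1 good, -1 bad).
    eps: hydrophobic interaction strength (assumed value, in kcal/mol).
    """
    # (1 + s) // 2 is 1 for a good contact and 0 for a bad one.
    return -eps * sum((1 + s) // 2 * a * t for a, t, s in zip(sigma, theta, S))


# Four sites: two matching good contacts, one mismatched good contact,
# and one bad contact that does not contribute.
example = hp_energy(sigma=(1, 1, -1, -1), theta=(1, -1, -1, 1), S=(1, 1, 1, -1))
```

In this example the two matching good contacts each contribute $-\varepsilon$, the mismatched good contact contributes $+\varepsilon$, and the bad contact contributes nothing.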

To study the recognition process between the two biomolecules, we adopt a two-stage approach. In the first
step, also referred to as the design step, we prepare an ensemble of probe molecules $\theta$ which are supposed
to recognize a given and fixed target $\sigma^{(T)} = (\sigma_1^{(T)}, \ldots, \sigma_N^{(T)})$. For every
possible configuration $\theta$ of the probe molecule we therefore assign a conditional probability of its
occurrence $P_D(\theta|\sigma^{(T)})$. We describe the conditions of the system by the Lagrange multiplier
$\beta_D > 0$ and demand a canonical Boltzmann distribution

$$P_D(\theta|\sigma^{(T)}) = \frac{1}{Z_D} \sum_{\{S\}} \exp\left[-\beta_D \mathcal{H}(\sigma^{(T)}, \theta; S)\right], \tag{2}$$

where the partition function $Z_D$ guarantees the normalization $\sum_{\{\theta\}} P_D(\theta|\sigma^{(T)}) = 1$.
The sum in (2) extends over all $2^N$ possible configurations of $S$. This design step has been introduced to
mimic the process of evolution in nature or design in biotechnological applications. We remark that the
parameter $\beta_D$, which can be interpreted formally as an inverse temperature in our simplifying view of
evolution or biotechnological design, basically controls the degree of optimization of the probe with respect
to the target. As recognizing biomolecules are usually well optimized to each other we typically choose a fairly
large value for $\beta_D$.
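
For small interfaces the design distribution can be tabulated by explicit enumeration. The following sketch assumes the basic HP energy of Eq. (1); the helper names `hp_energy` and `design_distribution` and the three-site target are hypothetical choices for illustration only.

```python
import itertools
import math


def hp_energy(sigma, theta, S, eps):
    # Eq. (1): only good contacts (S_i = +1) contribute.
    return -eps * sum((1 + s) // 2 * a * t for a, t, s in zip(sigma, theta, S))


def design_distribution(target, beta_d, eps=2.0):
    """Tabulate P_D(theta | target) by summing Boltzmann weights over all contact vectors S."""
    configs = list(itertools.product((-1, 1), repeat=len(target)))
    weights = {
        theta: sum(math.exp(-beta_d * hp_energy(target, theta, S, eps)) for S in configs)
        for theta in configs
    }
    z_d = sum(weights.values())  # partition function of the design step
    return {theta: w / z_d for theta, w in weights.items()}


target = (1, -1, 1)
dist = design_distribution(target, beta_d=1.0)
best_probe = max(dist, key=dist.get)
```

Because the sites are independent in this model, the enumeration can be replaced by a product of single-site factors for larger $N$; the brute-force version is kept here only because it mirrors the definition most directly.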

In the second step of our approach, we test the recognition ability of the designed ensemble. To this end, we
consider two copies of the ensemble of probe molecules at the inverse temperature $\beta > 0$: One ensemble is
given the target molecule $\sigma^{(T)}$, the other system interacts with a competitive rival molecule
$\sigma^{(R)} = (\sigma_1^{(R)}, \ldots, \sigma_N^{(R)})$. At that point, we simulate that the probe molecules
have to find their right partner and must decide between the formation of a complex with the target or the rival.
Our aim is to calculate the free energies of the two possible protein complexes; the complex with the lower free
energy is then realized in nature. First we evaluate the free energy for the complex consisting of target or
rival and a fixed probe molecule $\theta$:

$$F(\theta|\sigma^{(\alpha)}) = -\frac{1}{\beta} \ln \sum_{\{S\}} \exp\left[-\beta \mathcal{H}(\sigma^{(\alpha)}, \theta; S)\right], \tag{3}$$

for $\alpha \in \{T = \text{target}, R = \text{rival}\}$. Afterwards we average over the ensemble of probe
molecules using the conditional probability from the design step and get

$$F^{(\alpha)} = \sum_{\{\theta\}} F(\theta|\sigma^{(\alpha)})\, P_D(\theta|\sigma^{(T)}). \tag{4}$$

For further investigations we consider the difference of the free energy
$\Delta F(\sigma^{(T)}, \sigma^{(R)}) = F^{(T)} - F^{(R)}$ as a measure for the selectivity of the recognition
process. For $\Delta F(\sigma^{(T)}, \sigma^{(R)}) < 0 \Leftrightarrow F^{(T)} < F^{(R)}$ the target is
recognized by the probe molecules.
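
The two-stage procedure can be sketched numerically for a small interface. The code below enumerates all contact vectors and all probes for the basic HP energy of Eq. (1); the function names, the four-site structures and the parameter values are assumptions made for this illustration.

```python
import itertools
import math


def hp_energy(sigma, theta, S, eps):
    # Eq. (1): only good contacts (S_i = +1) contribute.
    return -eps * sum((1 + s) // 2 * a * t for a, t, s in zip(sigma, theta, S))


def free_energy(sigma, theta, beta, eps):
    """Free energy of one complex: -(1/beta) ln sum_S exp(-beta H), by enumeration of S."""
    configs = itertools.product((-1, 1), repeat=len(sigma))
    z = sum(math.exp(-beta * hp_energy(sigma, theta, S, eps)) for S in configs)
    return -math.log(z) / beta


def selectivity(target, rival, beta, beta_d, eps=2.0):
    """Delta F = F^(T) - F^(R), averaged over the probe ensemble designed for the target."""
    probes = list(itertools.product((-1, 1), repeat=len(target)))
    # Unnormalised design weight: sum_S exp(-beta_d H) = exp(-beta_d F) evaluated at beta_d.
    weights = [math.exp(-beta_d * free_energy(target, th, beta_d, eps)) for th in probes]
    z_d = sum(weights)
    return sum(
        w / z_d * (free_energy(target, th, beta, eps) - free_energy(rival, th, beta, eps))
        for w, th in zip(weights, probes)
    )


delta_f = selectivity(target=(1, 1, -1, -1), rival=(1, -1, -1, 1), beta=0.5, beta_d=1.0)
```

For this two-state contact variable one can check by hand that every site at which target and rival differ contributes $-\varepsilon \tanh(\beta_D \varepsilon / 2)$ to $\Delta F$, so the example, with two differing sites, yields a negative value and the target is recognized.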

Since we have decided to describe molecular recognition on a very coarse-grained level it is quite natural
that we will also average the difference in the free energy $\Delta F(\sigma^{(T)}, \sigma^{(R)})$ over all
possible structures of target and rival molecules. Assuming a uniform probability distribution for both the
target's and the rival's structure, one obtains a result $\langle \Delta F \rangle$ that does not depend on
specific configurations any more. This number can be interpreted as a characteristic selectivity of the model
and its associated Hamiltonian.
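
The structure average can be illustrated for a small interface by enumerating all target and rival structures. This sketch uses the single-site factorization of the basic HP model (each site contributes a weight $1 + e^{b \varepsilon \sigma_i \theta_i}$ after summing over $S_i$); the comparison value $-\frac{N\varepsilon}{2}\tanh(\beta_D \varepsilon/2)$ is the closed form that follows from this factorization and is stated here as a check, not quoted from the text. All function names are hypothetical.

```python
import itertools
import math


def site_weight(x, b, eps):
    # Sum over S_i in {-1, +1} of exp(b * eps * (1 + S_i)/2 * x), with x = sigma_i * theta_i.
    return 1.0 + math.exp(b * eps * x)


def delta_f(target, rival, beta, beta_d, eps):
    """F^(T) - F^(R) for one target/rival pair, averaged over the designed probes."""
    total, z_d = 0.0, 0.0
    for theta in itertools.product((-1, 1), repeat=len(target)):
        p = math.prod(site_weight(a * t, beta_d, eps) for a, t in zip(target, theta))
        f_t = -sum(math.log(site_weight(a * t, beta, eps)) for a, t in zip(target, theta)) / beta
        f_r = -sum(math.log(site_weight(a * t, beta, eps)) for a, t in zip(rival, theta)) / beta
        total += p * (f_t - f_r)
        z_d += p
    return total / z_d


def averaged_selectivity(n, beta, beta_d, eps=2.0):
    """Uniform average of Delta F over all 2^n targets and all 2^n rivals."""
    structures = list(itertools.product((-1, 1), repeat=n))
    pairs = [(t, r) for t in structures for r in structures]
    return sum(delta_f(t, r, beta, beta_d, eps) for t, r in pairs) / len(pairs)


avg = averaged_selectivity(n=3, beta=0.5, beta_d=1.0)
closed_form = -(3 * 2.0 / 2) * math.tanh(1.0 * 2.0 / 2)
```

The factor $N/2$ in the closed form is simply the mean number of sites at which two independently and uniformly drawn structures differ.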

Let us end this section with a brief comment on the restriction of the contact variable $S_i$ to two distinct
values. At first glance, it might seem that the distinction between only good and bad contacts is too simple and
naive. So one could suggest considering a finite number of discrete levels that interpolate between the extreme
cases of a good and a bad contact. This modification accounts for the fact that there is usually a considerable
number of possible alignments between the corresponding polar moments of opposite amino acids, for example.
It turns out that the selectivity depends in general on the structural information contained in the variables
$\sigma^{(T)}$ and $\sigma^{(R)}$. However, different models of the contact variable do not change the
corresponding functional dependence, although coefficients might be altered. Thus qualitative conclusions about
the behavior of the selectivity remain the same. Therefore the simplifying reduction to two different states of
the quality of a contact suffices to describe molecular recognition in the context of the presented approach.
One can show that the result for $\Delta F(\sigma^{(T)}, \sigma^{(R)})$ is even the same for non-uniformly
distributed (discrete or continuous) contact variables, as long as the distribution is symmetric with respect to
the value lying in the middle between the values for good and bad contacts lisll.



III. UNSPECIFIC INCLUSION OF INTERFACE 
WATER 

In this paper we are mainly concerned with the effect 
of imbedded solvent molecules at the interface of protein- 
protein complexes on the selectivity of molecular recog- 
nition. Our basic approach is based on the hydrophobic 
effect where bulk solvent degrees of freedom are already 
integrated out. The residual solvent degrees of freedom 
that show up at the interface as an active part have to be 
modeled explicitly. In our approach a solvent molecule
can be imbedded at a position where a bad contact appears.
Hence the contact variable $S_i$ describes the appearance
of cavities at the interface. In this section we
will relate the emergence of cavities to thermal fluctua- 
tions. In the next section cavities will be modeled as an 
intrinsic geometric feature that is not liable to thermal 
fluctuations so that their number is fixed. 

Let us allow for an existing cavity to be always filled by a water molecule that interacts unspecifically
with the adjacent amino acids so that the energy contribution does not distinguish between the types of
the amino acids. This might be interpreted as a van der Waals contribution which has to be distinguished from
a hydrogen bond that requires certain geometrical and structural prerequisites. To account for the geometrical
conditions we consider favorable ($-\gamma < 0$) or unfavorable ($\gamma > 0$) energy contributions and thus
introduce the variable $w = (w_1, \ldots, w_N)$ with $w_i \in \{-1, 1\}$ for the solvent degree of freedom to
distinguish between a favorable ($w_i = 1$) and an unfavorable ($w_i = -1$) energy contribution. The Hamiltonian
then consists of a sum due to the direct contacts at the proteins' interface as modeled in (1) and a second term
due to the burial of water molecules at bad contact sites:

$$\mathcal{H}(\sigma, \theta; S, w) = -\varepsilon \sum_{i=1}^{N} \frac{1+S_i}{2}\, \sigma_i \theta_i - \gamma \sum_{i=1}^{N} \frac{1-S_i}{2}\, w_i . \tag{5}$$

Consistent with observations, we require the ratio $\varepsilon/\gamma$ to be typically of the order of two to
three. For a good contact at site $j$, where water molecules cannot be imbedded, the variable $w_j$ corresponds
to a water molecule of the bulk and delivers an entropic contribution as it appears in the summation for the
partition function but provides no energy contribution.

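We start with the design of the probe ensemble. The calculation of the conditional probability
$P_D(\theta|\sigma^{(T)}) = \frac{1}{Z_D} \sum_{\{S, w\}} \exp\{-\beta_D \mathcal{H}(\sigma^{(T)}, \theta; S, w)\}$
gives

$$P_D(\theta|\sigma^{(T)}) = \frac{\prod_{i=1}^{N} \left[\exp(\beta_D \varepsilon \sigma_i^{(T)} \theta_i) + \cosh(\beta_D \gamma)\right]}{\left[4 \cosh\left(\frac{\beta_D}{2}(\varepsilon+\gamma)\right) \cosh\left(\frac{\beta_D}{2}(\varepsilon-\gamma)\right)\right]^{N}} . \tag{6}$$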

Before giving the result for the difference in the free energy we want to have a look at some observables of
the system which characterize the design step. We define the complementarity $K$ of the target $\sigma^{(T)}$
and a certain probe molecule $\theta$ as $K := \sum_{i=1}^{N} \sigma_i^{(T)} \theta_i$ whose possible values
range from $-N$ to $N$. A value of $K$ close to the maximum $N$ means a high structural complementarity so that
we expect the formation of a complex between target and probe to become energetically favorable. We can convert
the probability (6) into a distribution for the complementarity according to
$P_D(K) = \sum_{\{\theta\}} P_D(\theta|\sigma^{(T)})\, \delta_{K, \sum_{i=1}^{N} \sigma_i^{(T)} \theta_i}$.
Using that result to calculate an averaged complementarity of the designed structures $\theta$ (for fixed target
$\sigma^{(T)}$) according to $\langle K \rangle = \sum_{K} K\, P_D(K)$, we finally arrive at

$$\frac{1}{N} \langle K \rangle = \frac{\sinh\left(\frac{\beta_D \varepsilon}{2}\right) \cosh\left(\frac{\beta_D \varepsilon}{2}\right)}{\cosh\left(\frac{\beta_D}{2}(\varepsilon+\gamma)\right) \cosh\left(\frac{\beta_D}{2}(\varepsilon-\gamma)\right)} . \tag{7}$$

Note in particular that the resulting expression for $\langle K \rangle$ is independent of the given target
$\sigma^{(T)}$. Equation (7) provides an interpretation for the design parameter $\beta_D$, since for large
$\beta_D \to \infty$ one gets $\langle K \rangle \to N$, i.e. the probe molecules are well optimized with respect
to the fixed target and we thus talk of optimal design conditions. Further information that we can extract from
(7) concerns the influence of the interaction between the proteins and the water, given by the parameter
$\gamma$. As the complementarity is decreased for increasing $\gamma > 0$, we can already expect the selectivity
of the recognition to decay as well (compare figure 1).

FIG. 1: Averaged complementarity $\frac{1}{N}\langle K \rangle$ as a function of $\beta_D$ for $\varepsilon = 2$.
The energy parameter $\gamma$ takes the values 0, 0.5, 1, 1.5 (from the left to the right). Inset: Normalized
number of cavities $\langle L \rangle$ as a function of $\beta_D$ for the same parameters ($\gamma$ increases from
the left to the right).
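
The dependence of the averaged complementarity on the water interaction can be checked numerically. The sketch below evaluates the closed form $\frac{1}{N}\langle K \rangle = \sinh(\beta_D\varepsilon/2)\cosh(\beta_D\varepsilon/2) / [\cosh(\frac{\beta_D}{2}(\varepsilon+\gamma))\cosh(\frac{\beta_D}{2}(\varepsilon-\gamma))]$ and compares it with a direct single-site sum; the function names and the choice $\beta_D = 1$ are assumptions for illustration.

```python
import math


def mean_complementarity(beta_d, eps, gamma):
    """Closed form of <K>/N for the model with unspecific interface water."""
    half = 0.5 * beta_d
    numerator = math.sinh(half * eps) * math.cosh(half * eps)
    denominator = math.cosh(half * (eps + gamma)) * math.cosh(half * (eps - gamma))
    return numerator / denominator


def mean_complementarity_by_summation(beta_d, eps, gamma):
    """Same quantity from the single-site design weights, with x = sigma_i * theta_i."""
    weights = {x: math.exp(beta_d * eps * x) + math.cosh(beta_d * gamma) for x in (-1, 1)}
    return sum(x * w for x, w in weights.items()) / sum(weights.values())


# eps = 2 and the four gamma values quoted for figure 1, evaluated at beta_d = 1.
values = [mean_complementarity(1.0, 2.0, g) for g in (0.0, 0.5, 1.0, 1.5)]
```

The list `values` decreases monotonically with $\gamma$, and its first entry reduces to $\tanh(\beta_D \varepsilon / 2)$, the result without interface water.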

Another observable of interest is the number $L = \sum_{i=1}^{N} \frac{1-S_i}{2}$ of cavities at the interface.
Instead of $L$ we consider the normalized quantity

$$l^{(T)} := \frac{1}{N} \sum_{\{\theta\}} P_D(\theta|\sigma^{(T)}) \left\langle \sum_{i=1}^{N} \frac{1-S_i}{2} \right\rangle_{\sigma^{(T)}, \theta} \tag{8}$$

for a certain target $\sigma^{(T)}$. The angular brackets $\langle \cdot \rangle$ in (8) denote a thermal average
with respect to the fluctuating variables $S$ and $w$, the indices indicate that the structures $\sigma^{(T)}$
and $\theta$ are kept fixed. The result for $l^{(T)}$ proves to be independent of the target's structure and
shows also that the number of cavities increases with increasing $\gamma$ (compare figure 1). This is expected
because for larger $\gamma$ favorable contributions $-\gamma$ can appear, which first of all require the
existence of a sufficient number of cavities.
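
Since the sites are independent, the normalized number of cavities can be evaluated from a single-site sum. The sketch below assumes that the thermal average is taken at the design temperature $\beta_D$; the closed form $\cosh(\beta_D\gamma) / [\cosh(\beta_D\varepsilon) + \cosh(\beta_D\gamma)]$ used for comparison follows from carrying out that sum and is not quoted from the text. The function names are hypothetical.

```python
import math


def cavity_fraction(beta_d, eps, gamma):
    """Probability that a site is a cavity (S_i = -1), summed over x = sigma_i*theta_i, S_i, w_i."""
    z = 0.0
    cavities = 0.0
    for x in (-1, 1):
        for s in (-1, 1):
            for w in (-1, 1):
                energy = -eps * (1 + s) / 2 * x - gamma * (1 - s) / 2 * w
                boltzmann = math.exp(-beta_d * energy)
                z += boltzmann
                cavities += boltzmann * (1 - s) / 2
    return cavities / z


def cavity_fraction_closed_form(beta_d, eps, gamma):
    return math.cosh(beta_d * gamma) / (math.cosh(beta_d * eps) + math.cosh(beta_d * gamma))


fractions = [cavity_fraction(1.0, 2.0, g) for g in (0.0, 0.5, 1.0, 1.5)]
```

The values in `fractions` grow with $\gamma$, in line with the statement that a stronger water interaction favors a larger number of cavities.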

For the analysis of the difference in the free energy of the interaction with the target and rival, we introduce
the function

$$B(\varepsilon, \gamma; \beta) := 2 + \frac{1}{\beta\varepsilon} \ln\left(\frac{1 + \exp(-\beta\varepsilon) \cosh(\beta\gamma)}{1 + \exp(\beta\varepsilon) \cosh(\beta\gamma)}\right) \tag{9}$$

and obtain the simple result

$$\langle \Delta F \rangle = -\frac{N\varepsilon}{2}\, \frac{1}{N}\langle K \rangle(\varepsilon, \gamma; \beta_D)\, B(\varepsilon, \gamma; \beta) \tag{10}$$

for the selectivity averaged over all possible target and rival structures. Note that
$\frac{1}{N}\langle K \rangle(\varepsilon, \gamma = 0; \beta_D) = \tanh\left(\frac{\beta_D \varepsilon}{2}\right)$
and $B(\varepsilon, \gamma = 0; \beta) = 1$. We compare the characteristic selectivity
$\langle \Delta F \rangle$ of this model with the unmodified case $\gamma = 0$ and realize that the selectivity
decreases for increasing values of $\gamma$ as shown in figure 2. To get a rough estimate of this reduction
consider typical values of the interaction parameters. We assume a high degree of optimization during the design
step and hence choose $\beta_D$ to be typically larger than $\beta$. For the selectivity shown in figure 2 we
have chosen the parameters $\varepsilon = 2$, $\beta = 0.5$, $\beta_D = 1$ and find that the selectivity is then
reduced by 15% for $\gamma = 1$.

FIG. 2: The averaged selectivity (10) as function of $\gamma$ for $\varepsilon = 2$ with $\beta = 1$,
$\beta_D = 1$ (upper curve) and $\beta = 0.5$, $\beta_D = 1$ (lower curve). The dashed curve corresponds to the
parameters $\beta = 1$, $\beta_D = 0.5$. An increasing strength of the interaction with the water molecules leads
to a reduced selectivity.
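
The quoted reduction can be reproduced with a few lines. The sketch assumes that the averaged selectivity is proportional to the product of the averaged complementarity $\frac{1}{N}\langle K \rangle$ and the factor $B$, with $B$ written in the single-logarithm form $B = \frac{1}{\beta\varepsilon}\ln\frac{e^{\beta\varepsilon} + \cosh(\beta\gamma)}{e^{-\beta\varepsilon} + \cosh(\beta\gamma)}$ that results from the single-site sums and equals 1 for $\gamma = 0$. Function names are hypothetical.

```python
import math


def mean_complementarity(beta_d, eps, gamma):
    """<K>/N of the design step for the model with unspecific interface water."""
    half = 0.5 * beta_d
    return (math.sinh(half * eps) * math.cosh(half * eps)) / (
        math.cosh(half * (eps + gamma)) * math.cosh(half * (eps - gamma))
    )


def b_factor(beta, eps, gamma):
    """Free-energy factor B; equals 1 when gamma = 0."""
    c = math.cosh(beta * gamma)
    return math.log((math.exp(beta * eps) + c) / (math.exp(-beta * eps) + c)) / (beta * eps)


def relative_selectivity(beta, beta_d, eps, gamma):
    """Selectivity with interface water divided by the selectivity for gamma = 0."""
    with_water = mean_complementarity(beta_d, eps, gamma) * b_factor(beta, eps, gamma)
    without_water = mean_complementarity(beta_d, eps, 0.0) * b_factor(beta, eps, 0.0)
    return with_water / without_water


ratio = relative_selectivity(beta=0.5, beta_d=1.0, eps=2.0, gamma=1.0)
reduction_percent = 100.0 * (1.0 - ratio)
```

For these parameters `reduction_percent` comes out close to 15, in agreement with the estimate given in the text.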

The burial of solvent molecules as modeled according to (5) thus does not lead to an enhancement of the
selectivity. The primary reason for this is the thermally fluctuating number of cavities: for increasing
$\gamma$ the system tends to exhibit a larger number of cavities so that beneficial direct contacts are reduced
in their contribution to the selectivity. Only energy contributions of direct contacts, however, can discriminate
between the differences in the structures of the recognition sites of the target and the rival. The energy
contributions from imbedded solvent molecules are insensitive to those differences. For large $\gamma$ the free
energy for the interaction of the probe with the rival becomes more similar to the one from the interaction with
the target and hence selectivity is reduced. We will come back to this point at the end of section IV A.



IV. OPTIMAL HYDRATION OF GEOMETRIC 
CAVITIES 



In contrast to the previous model, we will now con- 
sider protein interfaces where the number of cavities is 
an intrinsic geometric constraint. Cavities appear in the 
interface as the roughness of the surface of the proteins 
might prevent a perfect fit of the shapes of the two pro- 
teins at some positions of the interface. For rigid proteins 
the roughness cannot relax and thus one expects the ap- 
pearance of a certain number of cavities irrespective of 
thermal fluctuations. Technically, the number of cavities 
is controlled by a Lagrange multiplier in our model. So 
the structure of the molecule is specified not only by the 
distribution of amino acids but in addition by a Lagrange 
parameter that contains information about the geometry 
of the cavities. In addition we allow the cavities to not 
necessarily be occupied with water molecules, i.e. a sin- 
gle cavity can, but does not have to be filled by solvent. 
We want to answer the question whether or not there
exists a characteristic fraction of occupied cavities which
leads to the maximum selectivity in the recognition process.
This enables us to distinguish between wet and dry
interfaces, as presented in [ol Il6l[20|.

We want to consider a situation where the imbedded water molecules mediate interactions between the adjacent
amino acids. We therefore require the interaction of a water molecule to depend on the polarity or
hydrophobicity of the adjacent monomers and introduce three different energy parameters
$\gamma_{PP} > \gamma_{HP} > \gamma_{HH}$. Here the parameter $\gamma_{PP}$ specifies the strength of the
water-bridged interaction in a cavity with two adjacent polar residues (PP-cavity), the parameters
$\gamma_{HP}$ and $\gamma_{HH}$ correspondingly the strength for HP- and HH-cavities. The order of these
parameters reflects the fact that water itself is polar and therefore the interaction with polar residues is more
favorable. Besides, the new parameters have to be chosen in such a way that the interaction strength of direct
contacts $\varepsilon$ stays larger. The energetics of the mediated interactions are intended to mimic hydrogen
bonds between the amino acids that are bridged by solvent molecules. Note that in real interfaces these bridged
hydrogen bonds can involve more than one water molecule [1^,131. We will, however, only distinguish between
filled and empty cavities, irrespective of the number of contained water molecules.

TABLE I: Investigated sets of energy parameters.

Set   $\varepsilon$   $\gamma_{PP}$   $\gamma_{HH}$   $\gamma_{HP}$
A     2               1               -1              0.5
B     2               1               -0.5            0.5
C     2               1               -0.5            0

Let us now define the $N$-dimensional vector $f = (f_1, \ldots, f_N)$, whose $i$-th component specifies whether
a cavity at site $i$ is filled by a water molecule ($f_i = 1$) or not ($f_i = 0$). As we want to consider
interfaces with a fixed total number of cavities we adjust this number by a Lagrange parameter $\mu$. In addition
we consider the selectivity for varying numbers of imbedded water molecules and thus control the number of filled
cavities technically by an additional Lagrange parameter $\zeta$. With the use of the abbreviations
$\alpha := \gamma_{PP} - \gamma_{HP}$, $\omega := \gamma_{HH} - \gamma_{HP}$ and $\eta := \gamma_{HP} + \zeta$
the additional terms in the Hamiltonian that are related to the cavities are then given by



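$$\mathcal{H}_{\rm cav} = -\mu \sum_{i=1}^{N} \frac{1-S_i}{2} - \sum_{i=1}^{N} \frac{1-S_i}{2}\, f_i \left[\alpha\, \delta_{\sigma_i,-1} \delta_{\theta_i,-1} + \omega\, \delta_{\sigma_i,1} \delta_{\theta_i,1} + \eta\right] . \tag{11}$$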



The contact variable $S_i$ thus models the appearance of real cavities. Apart from these contributions from
solvent in cavities the total energy of the interface contains the usual contact Hamiltonian
$\mathcal{H}_{\rm cont}$ as modeled in (1) so that $\mathcal{H} = \mathcal{H}_{\rm cont} + \mathcal{H}_{\rm cav}$.
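
The role of the abbreviations $\alpha$ and $\omega$ can be illustrated by evaluating the water-mediated energy of a single filled cavity. The sketch below uses parameter sets A and B of Table I, takes the energy of a filled cavity to be $-\gamma_{XY}$ (so positive $\gamma$ is favorable), and leaves out the Lagrange parameters; the dictionary layout and function name are hypothetical.

```python
# Energy parameters of sets A and B (eps: direct contacts, g_*: water-mediated contacts).
PARAMETER_SETS = {
    "A": {"eps": 2.0, "g_pp": 1.0, "g_hh": -1.0, "g_hp": 0.5},
    "B": {"eps": 2.0, "g_pp": 1.0, "g_hh": -0.5, "g_hp": 0.5},
}


def filled_cavity_energy(sigma_i, theta_i, params):
    """Energy of a water-filled cavity between residues sigma_i and theta_i (+1 = H, -1 = P).

    Uses alpha = g_pp - g_hp and omega = g_hh - g_hp, so that a PP-cavity yields -g_pp,
    an HH-cavity -g_hh and a mixed cavity -g_hp.
    """
    alpha = params["g_pp"] - params["g_hp"]
    omega = params["g_hh"] - params["g_hp"]
    is_pp = 1 if (sigma_i == -1 and theta_i == -1) else 0
    is_hh = 1 if (sigma_i == 1 and theta_i == 1) else 0
    return -(alpha * is_pp + omega * is_hh + params["g_hp"])


energies_a = {
    "PP": filled_cavity_energy(-1, -1, PARAMETER_SETS["A"]),
    "HP": filled_cavity_energy(1, -1, PARAMETER_SETS["A"]),
    "HH": filled_cavity_energy(1, 1, PARAMETER_SETS["A"]),
}
```

For set A this gives a favorable contribution for PP- and HP-cavities and a penalty for HH-cavities, which is the ordering described in the text.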

The strategy to calculate the selectivity for the above discussed model is outlined in the appendix. The
Lagrange parameters are used to fix the (normalized) number $l$ of cavities in the interface and the fraction
$f$ of cavities that are filled with water. We will utilize the normalization that $f \in [0, l]$. We thus obtain
the selectivity $\langle \Delta F \rangle_l(f)$ for interfaces with a fixed number of cavities as a function of
the number of imbedded molecules. We note that the actual results, which are presented in the subsequent
subsections, are obtained with a Mathematica program.



A. Selectivity enhancement 

As we want to compare protein interfaces with imbedded water molecules with the dry realization ($f = 0$), we
consider the correction factor
$C_l(f) := \frac{\langle \Delta F \rangle_l(f)}{\langle \Delta F \rangle_l(f=0)}$. The range over $f$ with
$C_l(f) > 1$ corresponds to increased selectivity of molecular recognition, whereas a correction factor with
$C_l(f) < 1$ describes lowered selectivity due to the inclusion of solvent molecules. Now we are interested in
the probability of the macroscopic realization for a wet interface, described by the parameters $l$ and $f$, in
contrast to a dry interface and obtain as a rough estimate

$$\frac{\mathrm{Prob}(\text{interface with water})}{\mathrm{Prob}(\text{dry interface})} \simeq \frac{e^{-\beta \langle \Delta F \rangle_l(f)}}{e^{-\beta \langle \Delta F \rangle_l(f=0)}} \sim e^{N (C_l(f) - 1)} . \tag{12}$$
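
To get a feeling for the numbers involved, the rough estimate can be evaluated for a given interface size. The sketch reads the estimate as an enhancement factor $e^{N(C_l(f) - 1)}$, which is an assumption about its precise form; the function names are hypothetical, and $N = 32$ with an enhancement of 1.9 are the values quoted for figure 3.

```python
import math


def enhancement_factor(correction, n_sites):
    """Probability ratio wet/dry in the rough estimate exp(N * (C - 1))."""
    return math.exp(n_sites * (correction - 1.0))


def correction_for_enhancement(factor, n_sites):
    """Inverse relation: correction factor C that yields a given enhancement."""
    return 1.0 + math.log(factor) / n_sites


# Correction factor implied by an enhancement of 1.9 for an interface with N = 32 sites.
c_implied = correction_for_enhancement(1.9, 32)
```

Under this reading, a correction factor only about two percent above one already translates into an enhancement close to a factor of two, because the small excess is multiplied by the interface size $N$ in the exponent.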



To obtain an impression of the size of a possible enhancement of the selectivity due to the inclusion of water
we have to choose a characteristic set of the involved parameters. For the discussion we will consider
interfaces whose fraction of cavities varies from 10% to 30% ($l = 0.1 \ldots 0.3$) which seems to be reasonable
for natural protein-protein interfaces [3, [l^. In the following we will discuss the results for $l = 0.3$ and
note that for $l = 0.1$ and $l = 0.2$ we obtain qualitatively similar results. Concerning the energy parameters
$\varepsilon$, $\gamma_{PP}$, $\gamma_{HP}$ and $\gamma_{HH}$, we will consider three exemplary combinations,
denoted by A, B, and C as shown in table I. For all combinations of parameters the inclusion of a water molecule
in a PP-cavity is energetically most favorable, whereas the interaction of a water molecule with at least one
polar residue in a HP-cavity is more beneficial than in a purely hydrophobic HH-cavity. Going from A to B we
leave $\varepsilon$, $\gamma_{PP}$ and $\gamma_{HP}$ unchanged, while the change of the parameter $\gamma_{HH}$
from $-1$ to $-0.5$ reduces the penalty for an inclusion of water between two hydrophobic residues. Accordingly,
at the change from B to C, the occupation of water between different types of amino acids becomes less favorable.
Furthermore, we set $\beta_D = 1$ as we want to have a high degree of optimization during the design and
$\beta = 0.5$, satisfying the relation $\beta\varepsilon = O(1)$.

FIG. 3: Analysis of $C_{l=30\%}(f)$ for the parameter set A and $\beta_D = 1$, $\beta = 0.5$, $N = 32$ (the inset
shows parameter set B (lower curve) and C (upper curve), the dashed curves correspond to set C for
$\beta_D = 0.8$ and 0.6 from above). The exactly averaged correction factor is shown together with the
approximation discussed in the appendix ($N = 32$ and 64 from above). The maximum at $f_{\rm opt} \approx 0.085$
leads to an enhancement factor of 1.9 (compare relation (12)). Note that for the exact average (A.7) a value for
$N$ has to be specified. Different choices, however, show only very small finite-size variations.

The correction factor $C_{l=30\%}(f)$ for parameter set A
and an interface with 30% cavity area is plotted in figure
3 for $N = 32$. We were able to show in general that
the correction factor $C_l(f)$ features a characteristic
maximum at some value of $f$, say $f_{opt}$, with $C_l(f_{opt}) > 1$,
which lies within the allowed interval of $f$.
The existence of a maximum of $C_l(f)$ means that there
is a fraction of occupied cavities for which the selectivity
of the recognition process becomes maximal. For
the considered parameters this typically results in an
enhancement of the selectivity for a hydrated interface by a
factor of two to four using the estimate (12) with $N = 32$.
Note that $N = O(30)$ holds for typical interfaces in natural
protein-protein complexes [Oj, [10|. The presented
example shows that, for an interface with 30% cavities,
roughly one third of the cavities should be filled with
water molecules on average to give maximum selectivity.
Interestingly, the correction factor first rises to a maximum
with $C_l(f_{opt}) > 1$ and afterwards even drops below
one.
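
As a quick check of the "one third" statement, and assuming that $f$ is measured per interface site so that it ranges over $[0, l]$ (as stated in the appendix), the occupied share of the cavities for parameter set A follows from the value $f_{opt} \approx 0.085$ quoted in the caption of figure 3:

```latex
\frac{f_{\mathrm{opt}}}{l} \approx \frac{0.085}{0.30} \approx 0.28 \approx \frac{1}{3}
```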
We now want to obtain a physical understanding of
the behavior of the correction factor $C_l(f)$, which can
show both an enhancement and a reduction of the selectivity
depending on the degree of hydration of the interface.
To this end we consider various observables which
characterize the interface between the probe molecule
and the target molecule in more detail. The observables
answer the question of how the cavities and the direct
contacts are distributed among the different pairs
of residues. We define the following quantities, which are
averaged over the ensemble of probe molecules:



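$$W^{PP}_{n_H^{(T)}} = \sum_{\{\theta\}} W^{PP}_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)}), \qquad (13)$$

$$W^{HP}_{n_H^{(T)}} = \sum_{\{\theta\}} W^{HP}_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)}), \qquad (14)$$

and

$$W^{HH}_{n_H^{(T)}} = \sum_{\{\theta\}} W^{HH}_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)}). \qquad (15)$$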



These quantities specify how often a particular type
of cavity is realized in the interface. We have chosen
the index $n_H^{(T)} = N_H^{(T)}/N$ because the obtained
expressions depend only on the target's hydrophobicity
$N_H^{(T)} = \sum_{i=1}^{N} \delta_{\sigma_i^{(T)},1}$. The formula for $W^{PP}_{\sigma^{(T)},\theta}$ is given
by

N 



u:=i / <j(T) fl 



for fixed target $\sigma^{(T)}$ and fixed probe molecule $\theta$; similar
definitions hold for $W^{HP}_{\sigma^{(T)},\theta}$ and $W^{HH}_{\sigma^{(T)},\theta}$. The corresponding
functions for the direct contacts are indicated by the
letter D:



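$$D^{PP}_{n_H^{(T)}} = \sum_{\{\theta\}} D^{PP}_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)}), \qquad (17)$$

$$D^{HP}_{n_H^{(T)}} = \sum_{\{\theta\}} D^{HP}_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)}), \qquad (18)$$

and

$$D^{HH}_{n_H^{(T)}} = \sum_{\{\theta\}} D^{HH}_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)}). \qquad (19)$$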



The definition of $D^{PP}_{\sigma^{(T)},\theta}$ is analogous to the previous
functions depending on a fixed target and probe molecule:




iV \^ 2 



(20) 
FIG. 4: Analysis of the correction factor $C_{l=30\%}(f)$ (shown in
figure 3) for the parameter set B and $\beta_D = 1$, $\beta = 0.5$, $N = 32$.
(a) Observables of cavities $W^{PP}$, $W^{HP}$, $W^{HH}$; (b) observables
of direct contacts $D^{PP}$, $D^{HP}$, $D^{HH}$. The observables are
evaluated for $n_H^{(T)} = 0.5$ and are shown in dependence on the
normalized $f$ (in %).



In figure 4 we have normalized the given observables
to the sum of all direct contacts and to the sum of all
occupied cavities, respectively. Since the observables have
to be computed for a certain hydrophobicity of the target,
we have chosen the typical value of $n_H^{(T)} = 0.5$. Note
that the $n_H^{(T)} = 0.5$ terms in (A.7) dominate the sum and
hence this also corresponds approximately to an average
over all target structures for sufficiently large $N$ (see
discussion in the appendix). For values of $0 < f < 0.13$
we see that the fraction of favorably embedded water
molecules in PP-cavities increases. For a small fraction
of filled cavities the solvent molecules will preferentially
be embedded in PP-cavities due to the large energy gain
they can provide. This goes along with a weak decrease
of direct PP-contacts. As the number of water molecules
increases further, eventually all PP-cavities are used
up and water molecules have to go into the HP-cavities,
as these still provide an energy gain. The relative fraction
of occupied PP-cavities is therefore reduced for
increasing $f$. This subsequent decrease of the PP-fraction
goes along with a decreasing selectivity of the recognition
process. We notice that the observables in figure 4
take similar values for $f = 0$ and $f = f_{max} = l$, although
$C_l(0) = 1$ is quite different from $C_l(f_{max})$, which can even
be smaller than one. This demonstrates that the competitive
influence of the rival on the selectivity becomes more and
more important for an increasing number of buried water
molecules. Note, however, that the selectivity does not
change its sign, so we still have recognition of the target
by the probe molecules.

Looking at the observables that describe direct contacts
(see figure 4), we observe that they show only a
weak dependence on $f$. The fractions $D^{PP}$ and $D^{HH}$
strongly dominate over the direct contacts between different
types of amino acids. For $f > 0$ we get $D^{HH} > D^{PP}$,
which can be explained in the following way: for a
site with opposing polar residues (PP) it is more
beneficial to fill a cavity with water than for an
HH-site, and therefore the HH-sites are more likely
to be used for direct contacts between the amino acids.

For all results shown in this subsection a high degree
of optimization has been assumed ($\beta_D = 1$ in comparison
to $\beta = 0.5$). If the quality of the design is reduced
by decreasing the parameter $\beta_D$, the enhancement of the
selectivity due to the inclusion of water molecules in the
interface is still present, but becomes progressively
weaker (see inset of figure 3). Even for
$\beta = \beta_D = 0.5$ a selectivity enhancement due to
hydration can appear, although this is not
the case for all sets of the parameters $\gamma_{HH}$, $\gamma_{PP}$ and $\gamma_{HP}$
(of the three sets considered, only for set C).

As a final comment let us come back to the situation
where the appearance of cavities is due to thermal
fluctuations and where, in contrast to the considerations in section
III, the energy contributions from embedded water particles
now distinguish between the different types of amino
acids of the cavities. Technically, this is achieved
by setting the Lagrange parameters in (11) to zero, so that
the number of cavities and the number of embedded
solvent molecules fluctuate. The cavity part of the
Hamiltonian reads



Hcnv = - ^ — 2^^'^ [aSa^^^iSei.-i + ujScri,iSe,,i + 7hp] 
1=1 

(21) 

where the parameter $\Gamma$ specifies the relative weight of
the direct contacts and the water-mediated interactions.
Notice that different to ^ no distinction between a fa- 
vorable and an unfavorable energy contribution of an em- 
bedded water molecule is incorporated. The selectivity 
as a function of the parameter $\Gamma$ is shown in figure 5
(compare also figure 2). One finds that the inclusion of
water molecules can lead to an enhanced selectivity,
although such an enhancement is not observed
for all considered parameter sets. The distinction of
the types of amino acids participating in water-mediated
interactions is thus crucial for the appearance of an
enhanced selectivity due to hydration. We also conclude
that the presence of rigid cavities seems to facilitate
the enhancement of selectivity.



FIG. 5: The averaged selectivity for the model (21) with a
thermally fluctuating number of cavities as a function of $\Gamma$ for
$\beta = 0.5$, $\beta_D = 1$ and the different relative adjustments of the
parameters $\gamma_{PP}$, $\gamma_{HH}$ and $\gamma_{HP}$ as specified in table I.

B. Dry and wet interfaces 

The last part of our investigation examines the influ- 
ence of the hydrophobicity of the interface on the en- 
hancement of the selectivity of the recognition process. 
In the previous subsection an average over all possible hy- 
drophobicities has been carried out so that the discussed 
results are general statements formulated for all classes of 
proteins and can be understood as a characteristic prop- 
erty of the considered model for molecular recognition. 
In nature, however, the hydrophobicity is typically dif- 
ferent for proteins that fulfill different biological tasks. 
For example, the average hydrophobicity of the interface 
of antigen-antibody complexes is relatively small (com- 
parable to the rest of the surface of the protein that is 
exposed to bulk water) whereas the interfaces of enzyme- 
inhibitor complexes are largely hydrophobic 0, S fiot - 
For this reason, we are also interested in an analysis of
the free energy difference for a given class of proteins
with fixed (averaged) hydrophobicity $\langle n_H^{(T)} \rangle$ of the target
(see the appendix for details of how the corresponding
correction factor $C_l(\langle n_H^{(T)} \rangle; f)$ is evaluated).

We find that the correction factor
$C_l(\langle n_H^{(T)} \rangle; f)$ for given averaged hydrophobicity $\langle n_H^{(T)} \rangle$ of
the target molecules develops a characteristic maximum
with $C_l(\langle n_H^{(T)} \rangle; f_{opt}) > 1$ for small hydrophobicities, so that
the selectivity is markedly enhanced in comparison to
the complex with a dry interface (see figure 6). For a
protein-protein complex with a given small hydrophobicity
of the interface, the scenario of a dry interface is thus
less favorable than the scenario with a hydrated interface.
The position $f_{opt}$ of the optimum filling fraction for the
class of proteins with a fixed hydrophobicity shifts
from wet to dry interfaces when the hydrophobicity
is increased, as shown in figure 7. We note, however, that
for complexes with large hydrophobicities the recognition
is still selective. One also observes that the transition
between an optimal dry and wet interface depends on the
chosen parameter values for the coupling constants. Our
findings thus reproduce the empirically found correlation
that the degree of hydration at protein-protein interfaces
decreases with the hydrophobicity of the interface
(compare [i,[ii,[23).




FIG. 6: Correction factor $C_l(\langle n_H^{(T)} \rangle; f)$ as a function of the
fraction $f$ of occupied cavities for different fixed
hydrophobicities $\langle n_H^{(T)} \rangle$ ranging from 0.1 to 0.9 in steps of 0.1 from top
to bottom (parameter set A and $l = 0.3$).

FIG. 7: Position $f_{opt}$ of the selectivity maximum as a function
of the hydrophobicity $\langle n_H^{(T)} \rangle$ of the interface with 30% cavities.
$f_{opt} \approx 0$ favors a dry interface, $f_{opt} = 0.3$ corresponds to a
maximally hydrated (wet) protein interface.



We conclude this subsection by considering the
modification of the model (11) with no discrimination of the
type of cavity with respect to the energy gain when water
is embedded. In terms of the coupling parameters
this means that we have $\gamma_{PP} = \gamma_{HP} = \gamma_{HH} = \gamma$. Again
we observe the appearance of a characteristic optimum
fraction of occupied cavities which maximizes the selectivity
such that $C_l(\langle n_H^{(T)} \rangle; f_{opt}) > C_l(\langle n_H^{(T)} \rangle; f = 0) = 1$.
However, if we again consider interfaces with a varying
hydrophobicity $\langle n_H^{(T)} \rangle$ of the target, the position $f_{opt}$ of
the selectivity maximum is not shifted. This can be
understood from the fact that the energy gain due to
embedding water molecules cannot resolve the hydrophobicity
of the interface. Consequently, no transition from a dry
to a wet interface shows up for this modification of the
cavity Hamiltonian.



V. SUMMARY 

On the basis of coarse-grained modeling we have in- 
vestigated the influence of solvent molecules on molecu- 
lar recognition and found that they can provide an en- 
hanced selectivity. To describe the molecular recognition, 
we have adopted a two-stage approach containing a de- 
sign of probe molecules and a testing of their recognition 
ability. The energy that stabilizes the protein-protein 
complex is described in a coarse-grained view on the level 
of the hydrophobicity of the amino acids and the residual 
solvent molecules imbedded at the interface. 

We discussed a model with an inclusion of water 
molecules in every cavity at the interface without any 
coupling to the composition of residues of the two pro- 
teins. For all kinds of additional interaction strengths the 
selectivity of the recognition process is then decreased. 
The focus of our investigation was then set on the model 
with an optional inclusion of water molecules at the in- 
terface. Additionally the interaction of water depends on 
the adjacent types of monomers. Having fixed the aver- 
age number of cavities at the proteins' interface as an 
intrinsic geometric constraint we have found that there 
is a characteristic fraction of occupied cavities such that 
the selectivity becomes maximum. We showed that in 
many cases it is advantageous to have an occupied frac- 
tion in between 25% and 75%. The probability to have 
recognition of the correct target molecule is then typically 
enhanced by a factor of two to four. In addition we could 
establish a correlation between the degree of hydration 
of the interface and its hydrophobicity which naturally 
leads to a discrimination of dry and wet interfaces. We 
thus reproduce empirical findings for real protein-protein 
interfaces on the level of a coarse-grained model. We fi- 
nally conclude that imbedded solvent molecules have to 
be considered as an active part of molecular recognition 
processes and can considerably contribute to the selec- 
tivity. 



APPENDIX: EVALUATION OF THE 
SELECTIVITY 

In this rather technical appendix we outline the strategy
to evaluate the selectivity of the recognition process
where a specified fraction of cavities is filled with water
molecules. The energy contributions at the interface are
modeled by the Hamiltonian (11).

Following the two-step approach to obtain the
selectivity, we first calculate the conditional probability

$$P_D(\theta|\sigma^{(T)}) = \frac{1}{Z_D} \sum_{\{S\}} \sum_{\{f\}} \exp\{-\beta_D H(\sigma^{(T)}, \theta; S, f)\}$$

in the design step. We emphasize that the Lagrange
parameters $\mu = \mu(\sigma^{(T)}, \theta; \beta_D)$ and $\xi = \xi(\sigma^{(T)}, \theta; \beta_D)$ that
have to be used for the design both depend on the
structure of the target $\sigma^{(T)}$, on the particular probe molecule $\theta$,
and on the design conditions $\beta_D$. For each interaction of
the probe with a molecule (target or rival) a different
set of Lagrange parameters has to be specified in the
most general situation. At the transition to the
testing step, we would consequently need to introduce additional
sets of different Lagrange multipliers corresponding to
the interaction of the probe with both the target $\sigma^{(T)}$
and the rival molecule $\sigma^{(R)}$ at inverse temperature $\beta$.
However, this most general treatment is rather cumbersome.
Instead, we fix the number of cavities and
the fraction of occupied cavities at the design step
and attribute this adjustment as an intrinsic geometric
property to the probe molecules, which is conserved at
the testing step. In doing so, only one set of Lagrange
parameters is necessary in the testing step. This set is
determined in the design and exhibits a dependence on
the previously fixed structure of the target. The
structure of the probe molecule at the interface is thus
specified by the set $(\theta, \mu(\sigma^{(T)}, \theta), \xi(\sigma^{(T)}, \theta))$. The cavity part
of the Hamiltonian for the testing step is hence given by
(11) with the set of Lagrange parameters obtained in the
design step.
Before we can evaluate the free energy difference we
have to calculate the Lagrange multipliers $\mu$ and $\xi$.
Since it suffices to fix the expectation values of the
normalized number of cavities $l$ and the fraction of
occupied cavities $f$ softly for
the ensemble of probe molecules, we just regard the
averaged quantities $l_{\sigma^{(T)}} = \sum_{\{\theta\}} l_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)})$ and
$f_{\sigma^{(T)}} = \sum_{\{\theta\}} f_{\sigma^{(T)},\theta} \, P_D(\theta|\sigma^{(T)})$, where



1x^/1-5. 



j(T) 



E 



(A.l) 



and 



(A.2) 



r(T), 



denote thermal averages in the design step (including
the Lagrange parameters) with fixed $\sigma^{(T)}$ and $\theta$. One
can show that the analytically obtained results for $l_{\sigma^{(T)}}$
and $f_{\sigma^{(T)}}$ do not depend on the exact structure of $\sigma^{(T)}$
but only on the target's hydrophobicity $n_H^{(T)}$, given by
$N_H^{(T)} = N n_H^{(T)} = \sum_{i=1}^{N} \delta_{\sigma_i^{(T)},1}$.

The free energy turns out to be determined by
the structural differences between the recognition sites
of the target and the rival. To write the free energy
difference in a compact way we
need quantities that specify the differences
between the target and the rival molecules. We thus define
$X = \sum_{i=1}^{N} \delta_{\sigma_i^{(T)},1} \delta_{\sigma_i^{(R)},-1} \in \{0, \ldots, N_H^{(T)}\}$ and
$Y = \sum_{i=1}^{N} \delta_{\sigma_i^{(T)},-1} \delta_{\sigma_i^{(R)},1} \in \{0, \ldots, N - N_H^{(T)}\}$. The free
energy difference for a given target and rival structure is
then given by



AF{a 



(T) ^(R) 



) = -i Bia, uj)X-^ Bito, a) Y, (A.3) 



where we have introduced the function B{a^uj) 
B{a, uj) 



GD(e,-)lnf£|^ + GD(-£,0)ln^ 



4g2/3DAt cosh(/3De) + 2 -f e'^o'' (1 -f e^'^) 

(A.4) 



with 



(A.5) 



and similarly 

Gd(x, y) = 2e''°^+2/3DM + i + f,0u{v+v) ^ (a.6) 

Note that the auxiliary function $B(\alpha, \omega)$ implicitly
depends on the structure $\sigma^{(T)}$ of the target through the
dependence on the Lagrange parameters. As already
mentioned above, this dependence reduces, however,
to a dependence on the hydrophobicity of the target,
that is $B(\alpha, \omega) = B(\alpha, \omega; n_H^{(T)})$. For this reason,
averaging over all possible structures of the target
and rival molecules "only" demands the computation
of $O(N)$ terms instead of an explicit evaluation
for all $2^N$ configurations. Using the expressions
for $l_{\sigma^{(T)}} = l_{\sigma^{(T)}}(N_H^{(T)}; \mu_{\sigma^{(T)}}, \xi_{\sigma^{(T)}})$ and
$f_{\sigma^{(T)}} = f_{\sigma^{(T)}}(N_H^{(T)}; \mu_{\sigma^{(T)}}, \xi_{\sigma^{(T)}})$, we set $l_{\sigma^{(T)}}$ and $f_{\sigma^{(T)}}$ to
some desired numbers $l$ and $f$, respectively, and obtain
numerically the values of $\mu_{\sigma^{(T)}}$ and $\xi_{\sigma^{(T)}}$.



Instead of computing $\Delta F(\sigma^{(T)}, \sigma^{(R)})$ for a specific
configuration $(\sigma^{(T)}, \sigma^{(R)})$, or, owing to the sole dependence
on the hydrophobicity, for a given combination
$(N_H^{(T)}, N_H^{(R)})$, we average over the ensemble of probe
molecules, which leads to the expression $\langle \Delta F \rangle_l(f)$,
depending on the fixed (average) number of cavities $l$ and
the occupied fraction $f$. The possible values of $f$ are
extrapolated to the real interval $[0, l]$. The expression for
$\langle \Delta F \rangle_l(f)$ is given by

$$\langle \Delta F \rangle_l(f) = \sum_{N_H^{(T)}=0}^{N} S(N_H^{(T)}; l, f) \qquad (A.7)$$



with 



X=0 Y=0 



and 



(A., 



B{a, uj; N^'> )X + B{lu, a; N^'>)Y 



(A.9) 

For the summation over the macroscopic parameters
$N_H^{(T)}$, $X$ and $Y$ the corresponding degeneracy (density)



N^h (N - n'^^^ 



Y 



(A.IO) 






of microscopic configurations $\sigma^{(T)}$ with respect to the
hydrophobicity $N_H^{(T)}$ has to be taken into account.
Using the selectivity (A.7) we consider the correction factor
$C_l(f) := \langle \Delta F \rangle_l(f) / \langle \Delta F \rangle_l(f = 0)$, which relates the probability
of having a hydrated interface with $f \neq 0$ to the one for a dry
interface with $f = 0$ (see section IV A).
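
The degeneracy entering these sums can be checked by brute force for small $N$. The following Python sketch assumes that $X$ counts the sites where the target is hydrophobic (encoded as $+1$) and the rival polar ($-1$), that $Y$ counts the opposite mismatch, and that the number of (target, rival) pairs with given $(N_H^{(T)}, X, Y)$ is a product of three binomial coefficients. The function names are illustrative and not taken from the original work, and the normalization used in (A.10) may differ.

```python
from collections import Counter
from itertools import product
from math import comb


def count_by_enumeration(n):
    """Count (target, rival) pairs of length n, grouped by (N_H, X, Y)."""
    counts = Counter()
    for target in product((1, -1), repeat=n):
        n_h = sum(1 for s in target if s == 1)
        for rival in product((1, -1), repeat=n):
            # X: target hydrophobic (+1), rival polar (-1)
            x = sum(1 for t, r in zip(target, rival) if t == 1 and r == -1)
            # Y: target polar (-1), rival hydrophobic (+1)
            y = sum(1 for t, r in zip(target, rival) if t == -1 and r == 1)
            counts[(n_h, x, y)] += 1
    return counts


def degeneracy(n, n_h, x, y):
    """Assumed closed form: choose the H sites, then the mismatched sites."""
    return comb(n, n_h) * comb(n_h, x) * comb(n - n_h, y)


if __name__ == "__main__":
    n = 6
    counts = count_by_enumeration(n)
    matches = all(c == degeneracy(n, *key) for key, c in counts.items())
    print(matches, sum(counts.values()) == 4 ** n)
```

The enumeration visits all $4^N$ pairs and is therefore only feasible for small $N$, which is exactly why the sums are organized in terms of the macroscopic parameters.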

For sufficiently large $N$ a good approximation for
$\langle \Delta F \rangle_l(f)$ can be obtained if we estimate the sums
in (A.7) by evaluating the strongly peaked function
$\Omega(N_H^{(T)}, X, Y)$ at its maximum $\Omega(N/2, N/4, N/4)$. This fact
may facilitate future calculations, since in the
presented context ($N \approx 30 \ldots 60$) there is almost no
difference between the exact and the approximated results. In
figure 3 the correction factor $C_l(f)$, which is discussed in
subsection IV A, is shown for the exact average together
with the approximation.
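
To illustrate why a single term can dominate, the sketch below locates the maximum of the pair count over $(N_H^{(T)}, X, Y)$ and reports how much of the total weight lies close to it. It again assumes a degeneracy given by a product of three binomial coefficients (an assumption about (A.10), whose exact normalization is not reproduced here); the function names and the choice of window are hypothetical.

```python
from math import comb


def degeneracy(n, n_h, x, y):
    """Assumed pair count for given (N_H, X, Y)."""
    return comb(n, n_h) * comb(n_h, x) * comb(n - n_h, y)


def peak_and_weight(n, window):
    """Return the argmax of the degeneracy and the fraction of the total
    weight within +/- window of the peak in each of N_H, X and Y."""
    terms = {
        (n_h, x, y): degeneracy(n, n_h, x, y)
        for n_h in range(n + 1)
        for x in range(n_h + 1)
        for y in range(n - n_h + 1)
    }
    peak = max(terms, key=terms.get)
    total = sum(terms.values())  # equals 4**n
    near = sum(
        w for key, w in terms.items()
        if all(abs(a - b) <= window for a, b in zip(key, peak))
    )
    return peak, near / total


if __name__ == "__main__":
    for n in (32, 64):
        peak, frac = peak_and_weight(n, window=n // 8)
        print(n, peak, round(frac, 3))
```

Under this assumption the maximum sits at $(N/2, N/4, N/4)$ whenever $N$ is divisible by four, since the product of binomials is a multinomial coefficient over four equally likely site types.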

The selectivity (A.7) involves an average over all target
structures, which are equally likely (expressed in terms of
an average over the corresponding hydrophobicities). For
the investigation of the optimal degree of hydration we
are also interested in an analysis of the free energy
difference for a given fixed (averaged) hydrophobicity $\langle n_H^{(T)} \rangle$
of the target (see section IV B). To this end an
additional Lagrange multiplier $\zeta$ that controls the
hydrophobicity of the target molecules has to be introduced. Note
that the hydrophobicity of the rival similarly has to be
fixed by a Lagrange parameter. As long as we choose
the target and the rival to have the same hydrophobicity,
however, the results discussed below will not depend on
the hydrophobicity $N_H^{(R)}$ of the rival. Hence, we replace



the probability $\propto \binom{N}{N_H^{(T)}}$ for a configuration to have the
hydrophobicity $N_H^{(T)}$ by the modified probability



(l + exp(-C))^V^H 



(A.ll) 



which can be used to express $\zeta$ in terms of a given $\langle n_H^{(T)} \rangle$.
Using this probability finally leads to a modified
correction factor $C_l(\langle n_H^{(T)} \rangle; f)$ for given averaged
hydrophobicity $\langle n_H^{(T)} \rangle$ of the target molecules:



N, 



N 



(A.12) 

where $S(N_H^{(T)}; l, f)$ is the function (A.8) of $N_H^{(T)}$ and
$(l, f)$.
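
The role of the additional Lagrange multiplier can be illustrated with a short Python sketch. It assumes that the modified probability is a binomial weight tilted by a factor $\exp(-\zeta N_H^{(T)})$ and normalized by $(1 + \exp(-\zeta))^N$; the sign convention of $\zeta$ and all function names are assumptions made for illustration. Under this assumption the mean hydrophobicity is $\langle n_H^{(T)} \rangle = 1/(1 + e^{\zeta})$, which can be inverted in closed form.

```python
from math import comb, exp, log


def zeta_for_mean(n_mean):
    """Invert <n_H> = 1 / (1 + exp(zeta)) for 0 < n_mean < 1."""
    return log((1.0 - n_mean) / n_mean)


def tilted_probability(n, n_h, zeta):
    """Assumed form: binom(N, N_H) * exp(-zeta * N_H) / (1 + exp(-zeta))**N."""
    return comb(n, n_h) * exp(-zeta * n_h) / (1.0 + exp(-zeta)) ** n


def mean_hydrophobicity(n, zeta):
    """Mean of N_H / N under the tilted distribution."""
    return sum(n_h * tilted_probability(n, n_h, zeta) for n_h in range(n + 1)) / n


if __name__ == "__main__":
    n = 32
    for target in (0.1, 0.5, 0.9):
        zeta = zeta_for_mean(target)
        print(target, round(zeta, 4), round(mean_hydrophobicity(n, zeta), 6))
```

With $\zeta = 0$ the unbiased average over all target structures is recovered, which in this sketch corresponds to a mean hydrophobicity of one half.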